example usage
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhang, Zhong, Lu, Yaxi, Fu, Yikun, Huo, Yupeng, Yang, Shenzhi, Wu, Yesai, Si, Han, Cong, Xin, Chen, Haotian, Lin, Yankai, Xie, Jie, Zhou, Wei, Xu, Wang, Zhang, Yuanheng, Su, Zhou, Zhai, Zhongwu, Liu, Xiaoming, Mei, Yudong, Xu, Jianming, Tian, Hongyan, Wang, Chongyi, Chen, Chi, Yao, Yuan, Liu, Zhiyuan, Sun, Maosong
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, which hinders the learning of precise grounding and planning. Models trained purely by imitation tend to overfit to seen interface patterns and fail to generalize in unfamiliar scenarios. Moreover, most prior work focuses on English interfaces while overlooking the growing diversity of non-English applications such as those in the Chinese mobile ecosystem. In this work, we present AgentCPM-GUI, an 8B-parameter GUI agent built for robust and efficient on-device GUI interaction. Our training pipeline includes grounding-aware pre-training to enhance perception, supervised fine-tuning on high-quality Chinese and English trajectories to imitate human-like actions, and reinforcement fine-tuning with GRPO to improve reasoning capability. We also introduce a compact action space that reduces output length and supports low-latency execution on mobile devices. AgentCPM-GUI achieves state-of-the-art performance on five public benchmarks and a new Chinese GUI benchmark called CAGUI, reaching $96.9\%$ Type-Match and $91.3\%$ Exact-Match. To facilitate reproducibility and further research, we publicly release all code, model checkpoints, and evaluation data.
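The abstract names GRPO as the reinforcement fine-tuning method but gives no details. As a rough illustration only, the group-relative advantage commonly associated with GRPO normalizes each sampled response's reward against the mean and standard deviation of its group. The function name, the `eps` constant, and the toy rewards below are assumptions for this sketch, not taken from the AgentCPM-GUI code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Hypothetical helper: a minimal sketch of group-relative normalization,
    not the authors' implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled action sequences, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group mean receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.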
Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks
Wu, Zongru, Cheng, Pengzhou, Wu, Zheng, Ju, Tianjie, Zhang, Zhuosheng, Liu, Gongshen
Perception-enhanced pre-training, particularly through grounding techniques, is widely adopted to enhance the performance of graphical user interface (GUI) agents. However, in resource-constrained scenarios, the format discrepancy between coordinate-oriented grounding and action-oriented reasoning limits the effectiveness of grounding for reasoning tasks. To address this challenge, we propose a query-oriented pivot approach called query inference, which serves as a bridge between GUI grounding and reasoning. By inferring potential user queries from a screenshot and its associated element coordinates, query inference improves the understanding of coordinates while aligning more closely with reasoning tasks. Experimental results show that query inference outperforms previous grounding techniques under the same training data scale. Notably, query inference achieves performance comparable to or even better than the large-scale grounding-enhanced OS-Atlas with less than 0.1% of the training data. Furthermore, we explore the impact of reasoning formats and demonstrate that integrating additional semantic information into the input further boosts reasoning performance. The code is publicly available at https://github.com/ZrW00/GUIPivot.
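To make the contrast between the two task formats concrete, the sketch below builds one grounding sample (query in, coordinates out) and one query-inference sample (coordinates in, query out) from the same annotation. The prompt wording, field names, file name, and bounding-box layout are all hypothetical and are not taken from the GUIPivot repository.

```python
def grounding_sample(screenshot, query, bbox):
    """Coordinate-oriented grounding: the model must output the element's box."""
    return {
        "image": screenshot,
        "input": f"Locate the element for the query: {query}",
        "target": str(list(bbox)),
    }


def query_inference_sample(screenshot, query, bbox):
    """Query inference: the model must output a user query for the given box."""
    return {
        "image": screenshot,
        "input": f"What user query targets the element at {list(bbox)}?",
        "target": query,
    }


# Hypothetical annotation: a settings button with an (x1, y1, x2, y2) box.
annotation = ("shot.png", "Open settings", (10, 20, 110, 60))
print(grounding_sample(*annotation))
print(query_inference_sample(*annotation))
```

Both samples come from the same annotation; only the direction of prediction changes, which is what lets the second format produce natural-language targets closer to action-oriented reasoning.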
Detection of Non-recorded Word Senses in English and Swedish
Lautenschlager, Jonathan, Sköldberg, Emma, Hengchen, Simon, Schlechtweg, Dominik
This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary or not. For this purpose, sense entries are compared with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder that allows us to model this task in a few-shot scenario. Additionally, we use human annotations to adapt and evaluate our models. Compared to a random sample from a corpus, our model is able to considerably increase the detected number of word usages with non-recorded senses.
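One simple way to picture the comparison between sense entries and word usages is a nearest-sense check: embed the usage and every dictionary sense, then flag the usage when even its closest sense falls below a similarity cut-off. The sketch below assumes the embeddings are already computed; the cosine measure, the threshold value, and the function names are illustrative assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_unrecorded(usage_vec, sense_vecs, threshold=0.5):
    """Flag a usage whose best-matching dictionary sense is below the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on annotated data.
    """
    best = max((cosine(usage_vec, s) for s in sense_vecs), default=0.0)
    return best < threshold


# Toy 2-d embeddings standing in for Word-in-Context vectors.
senses = [[1.0, 0.0], [0.0, 1.0]]
print(is_unrecorded([1.0, 0.0], senses))   # matches the first sense
print(is_unrecorded([-1.0, 0.0], senses))  # far from every recorded sense
```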
LILO: Learning Interpretable Libraries by Compressing and Documenting Code
Grand, Gabriel, Wong, Lionel, Bowers, Matthew, Olausson, Theo X., Liu, Muxin, Tenenbaum, Joshua B., Andreas, Jacob
While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods, including the state-of-the-art library learning algorithm DreamCoder, LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.
ChengBinJin/MRI-to-CT-DCNN-TensorFlow
This repository is an implementation of "MR‐based synthetic CT generation using a deep convolutional neural network method." The toy dataset includes only 367 paired images. We randomly divide the data into training, validation, and test sets. Use main.py both to train and to test the DCNN model.
deepmind/deepmind-research
This repository contains the trained model and dataset used for Unsupervised Adversarial Training (UAT) from the paper "Are Labels Required for Improving Adversarial Robustness?" Our model is available via TF-Hub. For example usage, refer to quick_eval_cifar.py. The preferred method of running this script is through run.sh, which will set up a virtual environment, install the dependencies, and run the evaluation script, which will print the adversarial accuracy of the model. Note that this file is very large and requires 227 GB of disk space.
BenWhetton/keras-surgeon
Keras-surgeon provides simple methods for modifying trained Keras models. Keras-surgeon is compatible with any model architecture. Any number of layers can be modified in a single traversal of the network. These kinds of modifications are sometimes known as network surgery, which inspired the name of this package. The operations module contains simple methods to perform network surgery on a single layer within a model.